A Dialect Distance Metric Based on String and Temporal Alignment

Authors

  • Thomas Kisler
  • Uwe D. Reichel

Abstract

The Levenshtein distance is an established metric for representing phonological distances between dialects. So far, this metric has usually been applied to manually transcribed word lists. In this study we introduce several extensions of the Levenshtein distance that incorporate probabilistic edit costs as well as temporal alignment costs. We tested all variants for compliance with the axiom that within-dialect utterance pairs are phonologically more similar than across-dialect ones. In contrast to former studies, we do not apply the metrics to preselected, prototypical word lists but to real connected speech data that was automatically segmented and labeled. It turned out that the transcription edit distances already performed well in reflecting the difference between within- and across-dialect comparisons, and that adding a temporal component weakens rather than improves the performance of the metrics.
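The abstract does not say how the probabilistic or temporal costs are computed, so the following Python sketch only illustrates the general idea under stated assumptions. It implements a standard dynamic-programming Levenshtein distance with a pluggable substitution cost, a hypothetical probabilistic cost of 1 - P(y | x) read from a phone confusion table, and a hypothetical temporal cost that adds the weighted absolute duration difference of two aligned (label, duration) segments. The names `weighted_levenshtein`, `probabilistic_cost`, `temporal_cost` and `mean_within_below_across` are illustrative and are not the authors' implementation; the axiom check is a deliberately simplified comparison of means.

```python
from statistics import mean
from typing import Any, Callable, Dict, Sequence, Tuple

# A cost function takes two aligned symbols (phones, or (phone, duration) segments).
CostFn = Callable[[Any, Any], float]


def weighted_levenshtein(a: Sequence[Any], b: Sequence[Any],
                         sub_cost: CostFn, indel_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance with a pluggable substitution cost."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of turning a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel_cost  # delete a[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel_cost  # insert b[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                        # deletion
                d[i][j - 1] + indel_cost,                        # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitution
            )
    return d[n][m]


def unit_cost(x: Any, y: Any) -> float:
    """Plain Levenshtein: identical symbols are free, any substitution costs 1."""
    return 0.0 if x == y else 1.0


def probabilistic_cost(confusion: Dict[Tuple[str, str], float]) -> CostFn:
    """Hypothetical probabilistic cost: 1 - P(y | x) from a phone confusion table.

    Pairs missing from the table are treated as P = 0, i.e. cost 1.
    """
    def cost(x: str, y: str) -> float:
        if x == y:
            return 0.0
        return 1.0 - confusion.get((x, y), 0.0)
    return cost


def temporal_cost(label_cost: CostFn, weight: float = 1.0) -> CostFn:
    """Hypothetical temporal cost for (label, duration_in_seconds) segments:
    the label cost plus the weighted absolute difference of the two durations.
    Insertions and deletions still cost indel_cost regardless of duration."""
    def cost(s: Tuple[str, float], t: Tuple[str, float]) -> float:
        return label_cost(s[0], t[0]) + weight * abs(s[1] - t[1])
    return cost


def mean_within_below_across(within: Sequence[float], across: Sequence[float]) -> bool:
    """Simplified, hypothetical check of the axiom: on average, within-dialect
    pairs should be closer than across-dialect pairs."""
    return mean(within) < mean(across)


if __name__ == "__main__":
    print(weighted_levenshtein("kitten", "sitting", unit_cost))  # 3.0
    p = probabilistic_cost({("a", "e"): 0.4})
    print(weighted_levenshtein("hat", "het", p))
    t = temporal_cost(unit_cost, weight=1.0)
    print(weighted_levenshtein([("a", 0.10), ("t", 0.05)],
                               [("a", 0.14), ("t", 0.05)], t))
```

In this sketch only the substitution cost varies between variants, while insertions and deletions keep a fixed cost; that is a simplifying design choice, and a real system could equally weight insertions and deletions by phone or duration.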

Similar resources

Computational dialectology in Irish Gaelic

Dialect groupings can be discovered objectively and automatically by cluster analysis of phonetic transcriptions such as those found in a linguistic atlas. The first step in the analysis, the computation of linguistic distance between each pair of sites, can be computed as Levenshtein distance between phonetic strings. This correlates closely with the much more laborious technique of determinin...

Adaptive String Distance Measures for Bilingual Dialect Lexicon Induction

This paper compares different measures of graphemic similarity applied to the task of bilingual lexicon induction between a Swiss German dialect and Standard German. The measures have been adapted to this particular language pair by training stochastic transducers with the Expectation-Maximisation algorithm or by using handmade transduction rules. These adaptive metrics show up to 11% F-measure ...

Evaluating the Pairwise String Alignment of Pronunciations

Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings, used e.g., in dialectology, historical linguistics, transliteration, and in evaluating name distinctiveness. The current study focuses on evaluating different PSA methods at the alignment level instead of via the distances it induces. About 3.5 million pairwise alignments...

Evaluation Of String Distance Algorithms For Dialectology

We examine various string distance measures for suitability in modeling dialect distance, especially its perception. We find measures superior which do not normalize for word length, but which are sensitive to order. We likewise find evidence for the superiority of measures which incorporate a sensitivity to phonological context, realized in the form of n-grams, although we cannot identify ...

Computing alignment plots efficiently

Dot plots are a standard method for local comparison of biological sequences. In a dot plot, a substring to substring distance is computed for all pairs of fixed-size windows in the input strings. Commonly, the Hamming distance is used since it can be computed in linear time. However, the Hamming distance is a rather crude measure of string similarity, and using an alignment-based edit distance...

Publication date: 2013